Software Vault: The Gold Collection


Text File | 1990-06-24 | 31KB | 687 lines

CHAPTER 10 - TEMPLATES

Do you remember when you were younger and you needed to look up a word in the dictionary? It would define the word in terms of a second word which you didn't know, so you would look that up too. Most likely that second word was either defined in terms of a third word you didn't know or it referred you back to the first word. This chapter is something like that. The items in the template file are interdependent. If you're lucky, everything will be clear by the time you have finished the chapter. If not, you'll have to reread it.

There are four different things which operate on the assembler instructions which you write - the ASSEMBLER, the LINKER, the LOADER and the 8086.

   1) The ASSEMBLER takes your text and turns it into the machine code that is used by the 8086. It is complete except that the addresses of data and subroutines might change during linking and loading. The assembler generates information called HEADER files which give the LINKER and LOADER the information they need to update these addresses in the machine code. This means that you can move the code anywhere in memory.

   2) If your program is made up of more than one file, the LINKER links them together and then makes the result ready for running. If there is only one file, the linker simply makes it ready for running. It does this by updating the addresses of anything it has moved. It still leaves the HEADER files which contain the segment addresses.

   3) At run time, the LOADER, which is part of the operating system, decides where to put your program in memory. It loads the program, and adjusts any segment addresses in the program to reflect where the program actually is in memory. It then gives control to the program.

   4) The code is fixed at the time the 8086 takes over. Any addresses are constants and are unchangeable.

Keep this in mind as we work through the template file.


THE .LST FILE

The first thing we need to look at is segments.
Let's look at a slightly modified version of the template file called segs.asm. Here it is.

;***********************************
; segs.asm
; - - - - - - - - - - - - -
STACKSEG  SEGMENT  STACK  'STACK'
     variable4  dw  4444h
                dw  100h dup (?)
STACKSEG  ENDS
; - - - - - - - - - - - - -
MORESTUFF  SEGMENT  PUBLIC  'HOHUM'
     variable2  dw  2222h
MORESTUFF  ENDS
; - - - - - - - - - - - - -
DATASTUFF  SEGMENT  PUBLIC  'DATA'
     variable1  dw  1111h
DATASTUFF  ENDS
; - - - - - - - - - - - - -
CODESTUFF  SEGMENT  PUBLIC  'CODE'
     EXTRN  print_num:NEAR , get_num:NEAR
     ASSUME cs:CODESTUFF,ds:DATASTUFF
     ASSUME es:MORESTUFF,ss:STACKSEG
     variable3  dw  3333h
main   proc far
start: push  ds
       sub   ax,ax
       push  ax
       mov   ax, DATASTUFF
       mov   ds,ax
       mov   ax, MORESTUFF
       mov   es,ax
       mov   cx, variable1
       mov   variable1, cx
       ret
main   endp
CODESTUFF  ENDS
; - - - - - - - - - - - -
       END   start
;***************************

The PC Assembler Tutor - Copyright (C) 1989 Chuck Nelson

There is an extra segment put in that has the definition

   MORESTUFF  SEGMENT  PUBLIC  'HOHUM'

There is a variable defined in each segment including the stack segment. These variables all have numbers in them, and the numbers are in hex so they will be easy to read. There are only two external subroutines (neither of which is called). It is time to take a look at an assembler listing.

----- THIS IS FROM THE SCREEN -----

C>masm segs
Microsoft (R) Macro Assembler Version 5.10
Copyright (C) Microsoft Corp 1981, 1988.  All rights reserved.

Object filename [segs.OBJ]:
Source listing  [NUL.LST]: segs
Cross-reference [NUL.CRF]:

-----------

If you don't put a semicolon after the filename with masm, you get some prompts. The first asks you if you want the object file name to be different from the asm file name. You may change either the name or the name and the extension. If you don't want to change either, just press ENTER. The second asks if you want a listing.
Normally you don't, so you just press ENTER. This time we do, so we give it the same name as the assembler file. The assembler will generate a file SEGS.LST. Finally, it asks if you want the information needed to create a cross-reference file. We won't cover that. Once again, press ENTER. The assembler generates an object file and a listing. Here's the complete listing.

**********************

Microsoft (R) Macro Assembler Version 5.10        9/2/89 09:50:54
                                                   Page     1-1

                         ; segs.asm
                         ; - - - - - - - - - - - - -
 0000                    STACKSEG  SEGMENT  STACK  'STACK'
 0000  4444                   variable4  dw  4444h
 0002  0100[                             dw  100h dup (?)
           ????
                 ]
 0202                    STACKSEG  ENDS
                         ; - - - - - - - - - - - - -
 0000                    MORESTUFF  SEGMENT  PUBLIC  'HOHUM'
 0000  2222                   variable2  dw  2222h
 0002                    MORESTUFF  ENDS
                         ; - - - - - - - - - - - - -
 0000                    DATASTUFF  SEGMENT  PUBLIC  'DATA'
 0000  1111                   variable1  dw  1111h
 0002                    DATASTUFF  ENDS
                         ; - - - - - - - - - - - - -
 0000                    CODESTUFF  SEGMENT  PUBLIC  'CODE'
                              EXTRN  print_num:NEAR , get_num:NEAR
                              ASSUME cs:CODESTUFF,ds:DATASTUFF
                              ASSUME es:MORESTUFF,ss:STACKSEG
 0000  3333                   variable3  dw  3333h
 0002                    main   proc far
 0002  1E                start: push  ds
 0003  2B C0                    sub   ax,ax
 0005  50                       push  ax
 0006  B8 ---- R                mov   ax, DATASTUFF
 0009  8E D8                    mov   ds,ax
 000B  B8 ---- R                mov   ax, MORESTUFF
 000E  8E C0                    mov   es,ax
 0010  8B 0E 0000 R             mov   cx, variable1
 0014  89 0E 0000 R             mov   variable1, cx
 0018  CB                       ret
 0019                    main   endp
 0019                    CODESTUFF  ENDS

Microsoft (R) Macro Assembler Version 5.10        9/2/89 09:50:54
                                                   Page     1-2

                         ; - - - - - - - - - - - -
                                END   start

Microsoft (R) Macro Assembler Version 5.10        9/2/89 09:50:54
                                                   Symbols-1

Segments and Groups:

         N a m e                Length  Align  Combine  Class

CODESTUFF  . . . . . . . . .    0019    PARA   PUBLIC   'CODE'
DATASTUFF  . . . . . . . . .    0002    PARA   PUBLIC   'DATA'
MORESTUFF  . . . . . . . . .    0002    PARA   PUBLIC   'HOHUM'
STACKSEG . . . . . . . . . .    0202    PARA   STACK    'STACK'

Symbols:

         N a m e                Type     Value  Attr

GET_NUM  . . . . . . . . . .    L NEAR   0000   CODESTUFF   External

MAIN . . . . . . . . . . . .    F PROC   0002   CODESTUFF   Length = 0017

PRINT_NUM  . . . . . . . . .    L NEAR   0000   CODESTUFF   External

START  . . . . . . . . . . .    L NEAR   0002   CODESTUFF

VARIABLE1  . . . . . . . . .    L WORD   0000   DATASTUFF
VARIABLE2  . . . . . . . . .    L WORD   0000   MORESTUFF
VARIABLE3  . . . . . . . . .    L WORD   0000   CODESTUFF
VARIABLE4  . . . . . . . . .    L WORD   0000   STACKSEG

@CPU . . . . . . . . . . . .    TEXT  0101h
@FILENAME  . . . . . . . . .    TEXT  segs
@VERSION . . . . . . . . . .    TEXT  510

     54 Source  Lines
     54 Total   Lines
     21 Symbols

  48006 + 428261 Bytes symbol space free

      0 Warning Errors
      0 Severe  Errors

**********************

As you can see, the listing, even for a short program, is very long. Let's take it apart section by section.

The first large section is a copy of the text file except that there is information on the left. The number on the far left tells the offset address (in hex) from the beginning of the segment for each label, variable or instruction. In this section:

   0000  3333                   variable3  dw  3333h
   0002                    main   proc far
   0002  1E                start: push  ds
   0003  2B C0                    sub   ax,ax
   0005  50                       push  ax
   0006  B8 ---- R                mov   ax, DATASTUFF
   0009  8E D8                    mov   ds,ax
   000B  B8 ---- R                mov   ax, MORESTUFF
   000E  8E C0                    mov   es,ax
   0010  8B 0E 0000 R             mov   cx, variable1
   0014  89 0E 0000 R             mov   variable1, cx
   0018  CB                       ret
   0019                    main   endp

"start" is at 0002h, "mov cx, variable1" is at 0010h and "ret" is at 0018h. The second set of numbers is the actual machine instructions in hex. These are what the 8086 operates on. "push ds" is 1E, "mov ds, ax" is 8E D8, and "ret" is CB. The instructions can be from 1 - 6 bytes long.

Notice the "R" after some of the instructions. The "R" stands for relocatable. This means that it is an address that might be changed by either the linker or the loader. We'll talk about that later. In any case, the object file keeps track of these so they can be changed if necessary.
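The offsets in the left column follow directly from the byte counts in the second column: each item starts where the previous one ended. The sketch below is not part of the original tutorial. It is a small Python check (Python is used only because it is easy to run, and the names `listing` and `offsets_from_lengths` are made up for illustration) that rebuilds the offsets of the CODESTUFF excerpt from the instruction lengths shown in the listing. It assumes that every "---- R" or "0000 R" operand occupies two bytes.

```python
# Instruction lengths (in bytes) read off the CODESTUFF part of the listing.
# A "---- R" or "0000 R" operand is a 2-byte address filled in later.
listing = [
    ("variable3 dw 3333h", 2),   # 3333 (one word)
    ("push ds",            1),   # 1E
    ("sub ax,ax",          2),   # 2B C0
    ("push ax",            1),   # 50
    ("mov ax, DATASTUFF",  3),   # B8 ---- R
    ("mov ds,ax",          2),   # 8E D8
    ("mov ax, MORESTUFF",  3),   # B8 ---- R
    ("mov es,ax",          2),   # 8E C0
    ("mov cx, variable1",  4),   # 8B 0E 0000 R
    ("mov variable1, cx",  4),   # 89 0E 0000 R
    ("ret",                1),   # CB
]

def offsets_from_lengths(items):
    """Return ([(offset, text), ...], total_length) for a list of (text, length)."""
    offset = 0
    placed = []
    for text, length in items:
        placed.append((offset, text))   # this item starts at the running offset
        offset += length                # the next item starts right after it
    return placed, offset

placed, segment_length = offsets_from_lengths(listing)
for offset, text in placed:
    print(f"{offset:04X}  {text}")
print(f"{segment_length:04X}  (end of CODESTUFF)")
```

The final value is the offset at which the listing shows CODESTUFF ENDS, and it is also the length the assembler reports for CODESTUFF in the segment table of the complete listing.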
Also, go back to the complete listing and look at the four variables; you will see that the values have been put in the object code; that is, 1111h, 2222h, 3333h and 4444h. If we had had an error, the assembler would have placed an error message at the spot of the error in this part of the file.

The next part of the .LST file is the segment listing. It tells how the segments are defined.

            N a m e                Length  Align  Combine  Class

   CODESTUFF  . . . . . . . . .    0019    PARA   PUBLIC   'CODE'
   DATASTUFF  . . . . . . . . .    0002    PARA   PUBLIC   'DATA'
   MORESTUFF  . . . . . . . . .    0002    PARA   PUBLIC   'HOHUM'
   STACKSEG . . . . . . . . . .    0202    PARA   STACK    'STACK'

We have the segment name, length, and some other information we'll talk about later. Notice that 'HOHUM', which is an artificial class, is dutifully listed with no complaints.

Then comes the listing of all labels, variables, and procedure names.

   Symbols:

            N a m e                Type     Value  Attr

   GET_NUM  . . . . . . . . . .    L NEAR   0000   CODESTUFF   External
   MAIN . . . . . . . . . . . .    F PROC   0002   CODESTUFF   Length = 0017
   PRINT_NUM  . . . . . . . . .    L NEAR   0000   CODESTUFF   External
   START  . . . . . . . . . . .    L NEAR   0002   CODESTUFF
   VARIABLE1  . . . . . . . . .    L WORD   0000   DATASTUFF
   VARIABLE2  . . . . . . . . .    L WORD   0000   MORESTUFF
   VARIABLE3  . . . . . . . . .    L WORD   0000   CODESTUFF
   VARIABLE4  . . . . . . . . .    L WORD   0000   STACKSEG

It shows the segment and offset of each symbol and whether it is a byte, word, procedure etc. The "L" stands for label. The variables and procedures which are in an external file are so marked. Neither print_num nor get_num was called, but the assembler maintains a listing for them.

Finally, some internal info for the assembler.

   @CPU . . . . . . . . . . . .    TEXT  0101h
   @FILENAME  . . . . . . . . .    TEXT  segs
   @VERSION . . . . . . . . . .    TEXT  510

We will come back to parts of the .LST file, so make yourself comfortable with it.


SEGMENTS

It is now time for the nitty-gritty. We need to know what all those statements in the template file mean.
Remember that there are four players in the game - (1) MASM, the Microsoft assembler, (2) LINK, the Microsoft linker, (3) the program loader and (4) the 8086 chip itself. Who does what to whom is the subject of this chapter.

You will notice that there are three segments in all the template files, one for data, one for code, and one for the stack. How many segments can a program have? An unlimited number for code, an unlimited number for data, and one for the stack.{1} Although you can have an unlimited number of segments, you can use only four at any one time - two for regular data (referenced by the DS and ES registers), one for code (referenced by the CS register), and one for temporary data (referenced by the SS register). You don't have direct control over CS. You should NEVER change the value in SS. This means that you can change only which segments ES and DS refer to.

____________________

   1 Although if you REALLY need more space for a stack it is possible, if a little arcane.

How do you do that? The 8086 does not allow you to move a constant into a segment register. Therefore it is a two step process. Put the constant into an arithmetic register (AX, BX, CX, DX, SI, DI or BP) and from there to the segment register. Suppose we have 327 different data segments in our file (named SEG1, SEG2, SEG3 ... SEG327) and we wanted to reference data in SEG27. The code would be:

   mov  ax, SEG27
   mov  ds, ax

This is the standard way to do it, and this is the same as the fourth and fifth instructions in the code segment of the template files where we are putting the address of DATASTUFF in ds.

What is that SEG27 in the instruction (mov ax, SEG27)? It is a constant. When the assembler assembles the program, it makes note of the fact that you want to have the starting address of SEG27 in that instruction (you saw the "R" in the listing for the instruction 'mov ax, DATASTUFF').
Later the linker makes sure there is a SEG27 segment in the complete program, gives it a temporary segment address, and puts this temporary address in every place that references that segment address. This address is guaranteed to be adjusted. You will see why when we look at the linker .MAP file. Finally, the loader (which is the program that puts your program into memory) puts the segment where it wants and updates all references to the segment address to reflect where it now is. Thus, the program is complete only when this information is put in at run time. Each time you run the program SEG27 might be in a different place, but the loader will always update the references correctly.

We named the segments SEG1, SEG2, etc. Does SEG have to be part of the segment name? Not on your life. Here are three perfectly acceptable segment definitions:

   CURLY  SEGMENT
   LARRY  SEGMENT
   MOE    SEGMENT

It is good practice to have 'SEG' as part of the segment name to remind you that these are segments, not variables, but this is a practice only, it is not a law. Any name you could use for a variable or a label you could use as a segment name. The reserved word SEGMENT after the name tells the assembler that this is the beginning of a segment with that name.

You tell the assembler that you are starting a segment with 'SEGMENT'

   CURLY  SEGMENT

and you tell the assembler that you are finished with that segment with the reserved word ENDS (END [of] Segment):

   CURLY  ENDS

You need to put the name of the segment before the ENDS directive.

In the template file, the data segment definition reads:

   DATASTUFF  SEGMENT  PUBLIC  'DATA'

DATASTUFF is the segment name, but what are PUBLIC and 'DATA' there for? To understand this, we need to look at the linker. First, let's assemble temp1.asm (our first template file) just the way it is.
---------- FROM THE SCREEN ----------

C>masm temp1.asm
Microsoft (R) Macro Assembler Version 5.10
Copyright (C) Microsoft Corp 1981, 1988.  All rights reserved.

Object filename [temp1.OBJ]:
Source listing  [NUL.LST]: temp1
Cross-reference [NUL.CRF]:

----------

We have made the listing file so let's look at the segment information.

            N a m e                Length  Align  Combine  Class

   CODESTUFF  . . . . . . . . .    000A    PARA   PUBLIC   'CODE'
   DATASTUFF  . . . . . . . . .    0000    PARA   PUBLIC   'DATA'
   STACKSEG . . . . . . . . . .    00C8    PARA   STACK    'STACK'

You will see that CODESTUFF is Ah (10d) bytes long, DATASTUFF has no data and is 0 bytes long, and STACKSEG is C8h (200d) bytes long. Now let's link temp1.obj and asmhelp.obj.

---------- FROM THE SCREEN -----

C>link temp1+asmhelp

Microsoft (R) Overlay Linker  Version 3.61
Copyright (C) Microsoft Corp 1983-1987.  All rights reserved.

Run File [TEMP1.EXE]:
List File [NUL.MAP]: temp
Libraries [.LIB]:

----------

This time we have made a listing file for the link process. It is called TEMP.MAP. Let's look at it.

   Start  Stop   Length Name                   Class
   00000H 000C7H 000C8H STACKSEG               STACK
   000D0H 00540H 00471H DATASTUFF              DATA
   00550H 01944H 013F5H CODESTUFF              CODE

   Program entry point at 0055:0000

This is what the map file looks like. There are still only three segments in the final executable file, STACKSEG, DATASTUFF and CODESTUFF. You will notice that the class name is still there, but the PUBLIC is missing. Its job is finished. "Start" says where the segment starts in the executable file, "Stop" says where the segment ends in the executable file, and "Length" says the length in bytes of the segment. These numbers are 5 digit hex numbers instead of 4. That means that they are showing the total address. The segment number is the left 4 digits of 'Start'.

STACKSEG is C8h (200d) bytes long like before.
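The three numeric columns of the map are tied together by simple arithmetic. As an illustration only (this Python sketch and its names `map_rows` and `describe` are not from the tutorial), the following checks that Length equals Stop - Start + 1 for each row and pulls out the segment number as the left four hex digits of Start, which is the same as dividing Start by 16.

```python
# Rows copied from TEMP.MAP: (name, start, stop, length), all 20-bit addresses.
map_rows = [
    ("STACKSEG",  0x00000, 0x000C7, 0x000C8),
    ("DATASTUFF", 0x000D0, 0x00540, 0x00471),
    ("CODESTUFF", 0x00550, 0x01944, 0x013F5),
]

def describe(name, start, stop, length):
    """Check one map row and return its segment number (paragraph address)."""
    assert stop - start + 1 == length, f"{name}: length does not match"
    assert start % 16 == 0, f"{name}: segment does not start on a paragraph"
    return start >> 4    # drop the last hex digit: the left four digits of Start

segment_numbers = {row[0]: describe(*row) for row in map_rows}
for name, seg in segment_numbers.items():
    print(f"{name:<10} segment {seg:04X}h")
```

The value found for CODESTUFF is the segment part of the entry point 0055:0000 reported at the bottom of the map.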
Although DATASTUFF had no data, it is now 471h (1137d) bytes long, and CODESTUFF was Ah (10d) bytes long before but now it is a whopping 13F5h (5109d) bytes long. What happened?

The linker did its work. One of the things the linker does is combine things that we want to be in the same segment. It took the DATASTUFF segment from temp1.obj and appended the DATASTUFF segment from asmhelp.obj, combining them into one larger segment.{2} It took the CODESTUFF segment from temp1.obj and appended the CODESTUFF segment from asmhelp.obj, making them one large segment. Why did it do that? Because we put the word "PUBLIC" in the segment definition. When the assembler sees "PUBLIC" in the segment definition, it passes that information along to the linker in a header file.{3} When the linker has a segment which is "PUBLIC", it will append any other segment which (1) is "PUBLIC", (2) has the same name (i.e. CODESTUFF or DATASTUFF or CURLY etc.), and (3) has the same class name{4}. All three things must be true for the linker to combine them. We will actually check this out a little later to make sure you believe it.

One other thing to notice is that the linker is allocating only as much space as is needed. It could allocate 65536 bytes for each segment defined, but it uses only as much as the program needs and then starts the next segment at the next segment starting address. This is efficient management of memory.

What is the advantage of combining the smaller segments into one larger segment? For code, there is no big advantage. But for data, remember that every time we want to access data, we need to have the starting address of that particular segment in register ds. We do this by using:

   mov  ax, DATASTUFF
   mov  ds, ax

If we have a number of data segments, every time we access data we need to (1) make sure that ds contains the address of the correct data segment, and (2) if not, we need to write the code to change ds. This entails using a lot of code, can be confusing and is certainly error prone. With one data segment, you simply load ds with the correct address at the beginning of the program and then forget about it. This should be a rule for you. Unless you have truly humongous amounts of data (over 65536 bytes), ALWAYS put all your data in the same segment.

____________________

   2 The linker always works from left to right. For each different type of segment, it starts with the first one it finds and then appends each succeeding one it finds.

   3 A header is information for the linker or loader which is put in front of the machine code in an object file or an executable file. There are typically a number of headers in front of the machine code.

   4 Remember that class names are somewhat arbitrary. I use 'CODE', 'DATA' and 'STACK' for clarity and because they are the standard Microsoft class names, but if you are not linking with anyone else's programs, you can use any class name you want.

Do you remember those dashes '----' in the assembler listing? That was because the assembler didn't have a segment address to put there.

   0004  B8 ---- R          mov   ax, DATASTUFF
   0007  8E D8              mov   ds,ax
   0009  8B 0E 0000 R       mov   cx, variable1
   000D  89 0E 0000 R       mov   variable1, cx

The linker now has a temporary address for the start of DATASTUFF (000D0h) so it will put the segment address (the left four hex digits) in this spot. This is temporary, but will be updated by the loader. If variable1 has been moved, it will update that too.

Why am I sure that these temporary segments will be moved? The segment address of STACKSEG is 0000h. The segment address of DATASTUFF is 000Dh (13d) and the segment address of CODESTUFF is 0055h (85d). But the operating system owns the first several THOUSAND segments. The loader will load your program in much higher memory. They must move.
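What the loader does with these temporary segment addresses can be pictured as adding one number to each of them. The following Python sketch is a simplified model, not the actual DOS loader: the load segment 1A2Bh is an invented value, and `relocate` and `physical_address` are hypothetical helpers. It assumes, as described above, that the linker left segment numbers relative to the start of the program and that the loader shifts them all by the segment where the program image really begins.

```python
# Temporary segment numbers assigned by the linker (taken from TEMP.MAP).
linker_segments = {"STACKSEG": 0x0000, "DATASTUFF": 0x000D, "CODESTUFF": 0x0055}

def relocate(segments, load_segment):
    """Shift every linker-assigned segment number by the load segment (16-bit wrap)."""
    return {name: (seg + load_segment) & 0xFFFF for name, seg in segments.items()}

def physical_address(segment, offset):
    """8086 address arithmetic: segment * 16 + offset, limited to 20 bits."""
    return ((segment << 4) + offset) & 0xFFFFF

# ASSUMPTION: the loader happens to place the program at segment 1A2Bh.
# The real value is chosen by the operating system each time the program runs.
load_segment = 0x1A2B
run_time = relocate(linker_segments, load_segment)

for name, seg in run_time.items():
    print(f"{name:<10} {linker_segments[name]:04X}h -> {seg:04X}h "
          f"(physical {physical_address(seg, 0):05X}h)")
```

Running it with a different `load_segment` changes every segment value by the same amount, which is why the distances between the segments stay as the linker fixed them while the program can sit anywhere in memory.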
So the linker combines all the segments we want to combine, and then it looks at the machine code and modifies every reference to the segments and to the variables which have been moved. That is a lot of work. For instance, when the linker appends asmhelp.obj, there are a hundred or so variables which it moves and a thousand or so references to those variables which it modifies. The linker does that every time you link a file with ASMHELP.OBJ. That's not too shabby.
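The combining step can be modelled in a few lines. This Python sketch is only an approximation of what LINK does, written for illustration: `align16`, `layout` and the sizes given for the ASMHELP.OBJ pieces are assumptions. The text never lists the ASMHELP segment lengths, so the values 471h and 13E5h are back-computed from TEMP.MAP on the assumption that each piece is paragraph-aligned (PARA) before it is appended. The combine type is not modelled either; every segment with the same name and class is simply treated as combinable, and the segments are laid out in the order in which they are first met.

```python
def align16(address):
    """Round an address up to the next paragraph (16-byte) boundary."""
    return (address + 15) & ~15

# (segment name, class, length) for each object file, in link order.
temp1 = [("STACKSEG", "STACK", 0x00C8), ("DATASTUFF", "DATA", 0x0000),
         ("CODESTUFF", "CODE", 0x000A)]
# ASSUMPTION: ASMHELP.OBJ sizes are not given in the text; these are inferred.
asmhelp = [("DATASTUFF", "DATA", 0x0471), ("CODESTUFF", "CODE", 0x13E5)]

def layout(object_files):
    """Combine same-name, same-class segments and assign start/stop addresses."""
    combined = {}                        # (name, class) -> combined length
    for segments in object_files:
        for name, cls, length in segments:
            key = (name, cls)
            # Append this piece at the next paragraph inside the combined segment.
            combined[key] = align16(combined.get(key, 0)) + length
    rows, address = [], 0
    for (name, cls), length in combined.items():
        start = align16(address)         # each segment starts on a paragraph
        stop = start + length - 1
        rows.append((start, stop, length, name, cls))
        address = stop + 1
    return rows

rows = layout([temp1, asmhelp])
for start, stop, length, name, cls in rows:
    print(f"{start:05X}H {stop:05X}H {length:05X}H {name:<10} {cls}")
```

Under those assumptions the printed rows have the same Start, Stop and Length values as TEMP.MAP. The model leaves out the part the paragraph above calls a lot of work, namely patching every reference to a variable that has moved.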